JAMIA Open
Top medRxiv preprints most likely to be published in this journal, ranked by match strength.
Objective: Patient data repositories often assemble medication data from multiple sources, necessitating standardization prior to analysis. We implemented and evaluated a medication standardization procedure for use with a wide range of pharmacy data inputs across all drug categories, which supports research queries at multiple levels of granularity. Methods: The GEMINI-RxNorm system automates the use of multiple RxNorm tools in tandem with other datasets to identify drug concepts from pharmacy ord...
Importance: Manual data extraction from genomic lab reports for on-line registries and databases is time-consuming for human resources such as clinical research coordinators. Automated tools, especially LLMs, can address these issues. Efficient and accurate data processing is crucial for building a reliable database. Objective: To streamline the data extraction and curation process for genetic testing lab reports using an LLM-based approach. Design: Nine sample molecular lab reports were selected fo...
Background: The Data Management and Sharing (DMS) Policy issued by the National Institutes of Health (NIH) requires most grant applications to include a DMS Plan, detailing data type(s), resources (e.g., data repositories, knowledgebases, portals) for data sharing, and a dissemination timeline. Researchers face challenges navigating the complex data landscape to identify data resources to fulfill the DMS Policy requirements. The National Institute of Allergy and Infectious Diseases (NIAID) aims to...
Objective: To characterize clinical value set issues and identify common patterns of errors. Materials and Methods: We conducted semi-structured interviews with 26 value set experts and performed root cause analyses of errors identified in electronic health records (EHRs). We also analyzed a random sample of user-reported issues from the Value Set Authority Center (VSAC), developing a categorization scheme for value set errors. Additionally, we audited medication value sets from three sources and a...
Background: Existing information resources about medicines and their indications have limited usefulness for health data analytics. The emerging potential of large language models (LLMs) to generate clinically accurate responses presents a novel opportunity to develop a comprehensive knowledge base of medicines and their clinical indications. Method: Unique medications from the English Prescribing Dataset (EPD) were extracted and included in a fine-tuned prompt pipeline using the GPT-4 and MedCAT L...
Objective: To evaluate Phenotype Execution and Modelling Architecture (PhEMA), to express sharable phenotypes using Clinical Quality Language (CQL) and intensional SNOMED CT Fast Healthcare Interoperability Resources (FHIR) valuesets, for exemplar chronic disease, sociodemographic risk factor and surveillance phenotypes. Method: We curated three phenotypes: Type 2 diabetes (T2DM), excessive alcohol use and incident influenza-like illness (ILI) using CQL to define clinical and administrative logic. We...
Medication product names in Swiss electronic health records are heterogeneous and often encode multiple attributes (e.g., ingredient, strength, dose form, packaging) in German free text. This limits interoperability and reduces the utility of ATC codes, which do not uniquely identify products. We compared two workflows for mapping Swiss medication products to RxNorm and RxNorm Extension: (i) an Observational Health Data Sciences and Informatics (OHDSI) USAGI workflow with lexical similarity and ...
Objective: We developed medExtractR, a natural language processing system to extract medication dose and timing information from clinical notes. Our system facilitates creation of medication-specific research datasets from electronic health records. Materials and Methods: Written using the R programming language, medExtractR combines lexicon dictionaries and regular expression patterns to identify relevant medication information (drug entities). The system is designed to extract particular medicat...
Extracting patient phenotypes from routinely collected health data (such as Electronic Health Records) requires translating clinically-sound phenotype definitions into queries/computations executable on the underlying data sources by clinical researchers. This requires significant knowledge and skills to deal with heterogeneous and often imperfect data. Translations are time-consuming, error-prone and, most importantly, hard to share and reproduce across different settings. This paper proposes a...
Background: A scalable approach for the sharing and reuse of human-readable and computer-executable phenotype definitions can facilitate the reuse of electronic health records for cohort identification and research studies. Description: We developed a tool called Sharephe for the Informatics for Integrating Biology and the Bedside (i2b2) platform. Sharephe consists of a plugin for i2b2 and a cloud-based searchable repository of computable phenotypes, has the functionality to import to and export fr...
Objective: To identify and define a process and framework for biomedical discovery research. Our study aim was to characterize the biomedical discovery lifecycle across data modalities and professional stakeholders involved in biomedical research to address the multiomics data challenges of precision medicine. Materials and Methods: We recruited fifteen professionals from various biomedical roles and industries to participate in 60-minute semi-structured interviews, which involved an assessment of ...
Objective: Clinical and phenotypic data available to researchers are often found in spreadsheets or bespoke data models. Bridging these to enterprise data warehouses would enable sophisticated analytics and cohort discovery for users of platforms like NHGRI's Genomic Data Science Analysis, Visualization, and Informatics Lab-space (AnVIL). We combine data mapping methodologies, biomedical ontologies, and large language models (LLMs) to load these data into Inf...
Objectives: Social determinants of health (SDoH) are critical drivers of health outcomes but are often under-documented in structured electronic health record data. This study aimed to develop and evaluate scalable methods for extracting seven SDoH domain categories and 23 subcategories from unstructured clinical notes using both rule-based and large language model (LLM)-based approaches. Methods: We constructed a gold-standard SDoH corpus comprising clinical text segments from 171 patients in the ...
Electronic health record (EHR) data are a rich and invaluable source of real-world clinical information, enabling detailed insights into patient populations, treatment outcomes, and healthcare practices. The availability of large volumes of EHR data is critical for advancing translational research and developing innovative technologies such as artificial intelligence. The Evolve to Next-Gen Accrual to Clinical Trials (ENACT) network, established in 2015 with funding from the National Center for...
Objective: Growing numbers of academic medical centers offer patient cohort discovery tools to their researchers, yet the performance of systems for this use case is not well understood. The objective of this research was to assess patient-level information retrieval (IR) methods using electronic health records (EHR) for different types of cohort definition retrieval. Materials and Methods: We developed a test collection consisting of about 100,000 patient records and 56 test topics that characteri...
Background: For accurate medication usage statistics and medication adherence calculations, we need to have an accurate days supply (DS) for each prescription. Unfortunately, the DS or the information needed for calculating the DS is often not provided. Therefore, other methods need to be applied to acquire missing values or substitute incorrect values. Objective: The aim of this study is to apply a variety of methods for managing incomplete and missing data to enhance the accuracy of calculating DS ...
High-throughput phenotyping strategies are capable of classifying large volumes of patients. However, translating this data to real world applications is challenging. We have developed GeoPheno, a tool which displays the geospatial prevalences of EHR-based phenotypes in the Veteran population over time. Our flexible tool can display data from a wide array of phenotypes and is integrated with the CIPHER phenotype library, allowing users to view the definitions of the conditions being visualized.
Performance of systems used for patient cohort identification with electronic health record (EHR) data is not well-characterized. The objective of this research was to evaluate factors that might affect information retrieval (IR) methods and to investigate the interplay between commonly used IR approaches and the characteristics of the cohort definition structure. We used an IR test collection containing 56 test patient cohort definitions, 100,000 patient records originating from an academic me...
Background: Computable phenotypes are increasingly important tools for patient cohort identification. As part of a study of risk of chronic opioid use after surgery, we used a Resource Description Framework (RDF) triplestore as our computable phenotyping platform, hypothesizing that the unique affordances of triplestores may aid in making complex computable phenotypes more interoperable and reproducible than traditional relational database queries. To identify and model risk for new chronic opioi...
Objective: Clinical decision support systems (CDSS) have a critical role in improving the quality and safety of health care delivery. CDSS rules direct the behavior of CDSS. However, CDSS rules have not been routinely shared and reused, and ontology can promote the reuse of CDSS rules. We systematically screened literature to elaborate on the current status of ontology applied in CDSS rule management. Methods: We searched PubMed, the Association for Computing Machinery (ACM) Digital Library, ...